Abstract

Time-of-Flight (ToF) depth sensing cameras can obtain depth maps at a high frame rate. However, their low resolution and sensitivity to noise remain a concern. A popular solution is to upsample the noisy low resolution depth map under the guidance of the companion high resolution color image. However, due to the constraints of existing upsampling models, the high resolution depth map obtained in this way may suffer from either texture copy artifacts or blurred depth discontinuities. In this paper, a novel optimization framework is proposed with a new data term and a new smoothness term. Comprehensive experiments on both synthetic and real data show that the proposed method handles texture copy artifacts and blurred depth discontinuities well. It is also robust to noise. Moreover, a data driven scheme is proposed to adaptively estimate the parameter in the upsampling optimization framework. The encouraging performance is maintained even for large upsampling factors, e.g. 8x and 16x.


1. Introduction 

Acquiring the depth information of 3D scenes is essential for many applications in computer vision and graphics, such as 3D modeling, 3DTV and augmented reality. Recently, Time-of-Flight (ToF) cameras, e.g. the Kinect v2.0 sensor, have shown impressive results and become increasingly popular. They can obtain dense depth measurements at a high frame rate. However, the resolution is generally very low and the depth map is often corrupted by strong noise.

Tremendous effort has been put into improving the resolution of depth maps acquired by ToF cameras. The solutions usually fall into three categories. In the first category, a single low resolution depth map is upsampled through different data-driven approaches. This may be achieved by exploiting similarity across relevant depth patches in the same depth map [6]. The resolution can also be synthetically increased by exploiting an existing generic database containing large numbers of high resolution depth patches [8][11], an approach inspired by the work in [3][4]. The second category upsamples the low resolution depth map by integrating multiple low resolution depth maps of the same scene, which may be acquired at the same time but from slightly different views [15][17], or at different sensing times [5]. These methods assume that the scene is static. The third category achieves upsampling with the support of a high resolution guidance color image [1][2][ ][12][13][21][22][23]. In this category, it is assumed that depth discontinuities on the depth map and color edges on the color image co-occur in the corresponding regions.

Figure 1. 8x upsampling result patches of Art from the Middlebury dataset [16]. (a) The noisy low resolution depth map patch and the corresponding color image. (b) The ground truth. (c) The upsampling result by the state-of-art method in [2]. The upsampling result by our method (d) without adaptive bandwidth selection and (e) with adaptive bandwidth selection. (f) The corresponding bandwidth map obtained by our method.
Image guided upsampling methods have several advantages over the other two categories and have become popular in recent years. They can yield much better upsampling quality than single depth map upsampling [6]. Besides, they can achieve larger upsampling factors and do not need any prior database, in contrast to the methods in [11][8]. Compared with the second category, image guided upsampling is not restricted to static scenes and does not need a complicated camera calibration process. Despite these advantages over the first two categories, image guided upsampling has three main issues: 1) texture copy artifacts in smooth depth regions when the corresponding color image is highly textured; 2) blurred depth discontinuities when the corresponding color image is more homogeneous; 3) a drop in performance for highly noisy depth maps.


In this paper, a new image guided upsampling approach is proposed. It addresses the issues of the existing methods in the same category mentioned above. We formulate a novel optimization framework with a new data term and a new smoothness term compared with other state-of-art models [1][2][13][21]. Pixel-by-patch validity checking is introduced in the data term in place of the pixel-by-pixel checking used by existing methods. Also, a new error norm is proposed to replace the L2 norm. Together, these improve the robustness of the framework against noise. Moreover, the new smoothness term relaxes the strict assumption that depth discontinuities and color edges co-occur, which is one of the major problems of existing methods in the same category. To further improve the performance of the proposed framework, we propose a new data driven parameter selection scheme that adaptively estimates the parameter in the optimization process. Experimental results on synthetic and real data show that our method outperforms other state-of-art methods in both visual quality and quantitative accuracy. The encouraging performance is retained even for larger upsampling factors such as 8x and 16x.


The rest of this paper is organized as follows. Section 2 reviews related work: we briefly present the MRF framework in [ ] and its extension in [13], which are closely related to our work, and then analyze the shortcomings that motivate the proposed method. In Section 3, we first detail our upsampling model. Then, we explain how the proposed model handles the constraints mentioned above. Finally, a data driven parameter estimation scheme is proposed to adaptively select the parameter in the model. Section 4 shows our experimental results and the comparison with other state-of-art methods. We draw the conclusion in Section 5.


2. Related work 


One of the best known image guided upsampling methods is based on Markov Random Fields (MRF) [1]. It is further extended in [13], which we denote as NLM-MRF. Both methods, along with similar methods such as [2][7][9][10][12][21][22], use cues from the corresponding color image as the reference to upsample the low resolution depth map. They all assume that color edges co-occur with depth discontinuities. To clearly demonstrate the essential constraints of the existing methods, we briefly review the existing model as follows.

Given a noisy low resolution depth map $D^L$ and the companion high resolution color image $I$, the task is to upsample $D^L$ to $D^H$, whose resolution is the same as that of $I$. The work in [1] is modeled through the MRF as^1:

$$D^H = \arg\min_D \Big\{ (1-\alpha) \sum_{i \in \Omega} (D_i - D_i^0)^2 + \alpha \sum_{i \in \Omega} \sum_{j \in N(i)} \omega^c_{i,j} (D_i - D_j)^2 \Big\} \quad (1)$$

where $D^0$ is the initial value of $D^H$, which can be obtained by interpolating $D^L$ with a basic interpolation such as bicubic interpolation [1]. $\Omega$ represents the coordinates of the high resolution depth map. $N(i)$ is the neighborhood of $i$ in the square patch centered at $i$. The first term in Eq.(1) is the data term, which enforces the similarity between the corresponding positions of the high resolution depth map and the low resolution depth map. The second term in Eq.(1) is the smoothness term, which enforces smoothness in the neighboring area. The data term and the smoothness term are balanced by the parameter $\alpha$. $\omega^c_{i,j}$ is the color weight defined in Eq.(2); it is determined by the color difference between positions $i$ and $j$ on the color image, where $C = \{R, G, B\}$ represents the different channels of the color image and $\sigma_c$ is a constant defined by the user.

This framework has the following two constraints:

First, it is sensitive to the noise in the depth map. To show this constraint clearly, we take the derivative of the objective function in Eq.(1) with respect to $D$ and set it to zero, which yields the iterative formulation of Eq.(3). Eq.(3) has two important terms: the first term of the numerator is related to the data term in Eq.(1), and the second term is related to the smoothness term in Eq.(1). The denominator in Eq.(3) can be considered a constant for a given $i \in \Omega$. It is irrelevant to the depth change and is therefore skipped in our analysis. The first term contains only the initial value at the corresponding position. For an input free of noise (including sensing noise and the noise caused by the bicubic interpolation), the initial value is close to the groundtruth. However, when the input contains significant noise, the initial value may be far from the groundtruth. Simply adding this noisy initial value will greatly perturb the upsampling result. This implies that the data term in Eq.(1) is sensitive to noise. From the data term in Eq.(1), it is seen that, during the upsampling process, the validity and quality of the depth value at each position are measured pixel-by-pixel in order to enforce the similarity between the corresponding positions of the high resolution depth map and the low resolution depth map. Also, the L2 norm is applied to calculate the similarity. These two factors together make the data term sensitive to noise.

^1 We have slightly adjusted the original model to have a better comparison with our model. However, the mechanism remains the same.

Figure 2. (a) The synthetic depth map. (b) Its corresponding synthetic color image. (c) The noisy low resolution depth map (shown with bicubic interpolation). The result obtained by (d) the MRF in [1] (RMSE: 1.83), (e) its extension, the non-local means MRF in [13] (RMSE: 2.07), (f) the image guided total generalized variation upsampling in [2] (RMSE: 1.69) and (g) our method (RMSE: 0.78). (d)-(f) cannot smooth the noise well or properly preserve the depth discontinuity where it does not correspond to a color edge.

Second, the smoothness term in Eq.(1) unnecessarily enforces the assumption that color edges and depth discontinuities co-occur. The weighting value in this term (see Eq.(3)) is determined by the color difference of the corresponding positions on the color image. However, this assumption does not always hold. Sometimes locations with a small depth difference on the depth map have a large color difference on the color image. In this case, the upsampled depth map may suffer from texture copy artifacts. Conversely, positions with a large depth difference may have a small color difference on the color image. In that case, the second term will be close to the mean of the depth values in $N(i)$, which blurs depth discontinuities.
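Both constraints can be read off a fixed-point update of Eq.(1). The following derivation is a sketch rather than a reproduction of Eq.(3): it assumes symmetric weights ($\omega^c_{i,j} = \omega^c_{j,i}$), so that each pair of pixels appears twice in the smoothness sum, and the exact constants used by the authors may differ.

```latex
\frac{\partial}{\partial D_i}:\quad
2(1-\alpha)\,(D_i - D_i^0)
  + 4\alpha \sum_{j \in N(i)} \omega^c_{i,j}\,(D_i - D_j) = 0
\quad\Longrightarrow\quad
D_i^{n+1} =
  \frac{(1-\alpha)\,D_i^0 + 2\alpha \sum_{j \in N(i)} \omega^c_{i,j}\,D_j^n}
       {(1-\alpha) + 2\alpha \sum_{j \in N(i)} \omega^c_{i,j}}
```

In this form the data contribution is the single, possibly noisy sample $D_i^0$, and the smoothness contribution is an average of $D_j^n$ weighted by color similarity alone. Weights that vary inside a flat depth region lead to texture copy, and weights that stay nearly uniform across a depth edge lead to blurring.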

The work in [13] extended the MRF by adding one more regularization term, called the non-local means term. Moreover, unlike the weight in Eq.(2), which is based only on the color image, they further combine segmentation, color information and edge saliency, as well as the bicubic upsampled depth map, to define the weight. Although the work in [13] somewhat improves the performance compared with [1], it does not significantly change these two terms, and thus the performance improvement is limited. Figure 2 illustrates the 4x upsampling results of a synthetic depth map^2. In this synthetic experiment, we show that these two methods are not robust against noise and cannot handle the case where the color edge and the depth discontinuity are not consistent. We will further illustrate the texture copy artifacts of these two methods in the experimental part.

3. The method 

In this section, we describe our optimization framework for upsampling a noisy low resolution depth map to a higher resolution one given a companion color image. Unlike previous work, we do not assume that color edges and depth discontinuities necessarily co-occur. Our framework is also more robust against noise.

3.1. The upsampling model 

Our upsampling model consists of two terms: the data term and the smoothness term. Given a noisy low resolution depth map $D^L$, it is first interpolated to $D^0$ by bicubic interpolation. Then our upsampling model is formulated as:

$$D^H = \arg\min_D \big\{ (1-\alpha) E_D(D, D^0) + \alpha E_S(D) \big\} \quad (4)$$

where $E_D(D, D^0)$ is the data term that makes the result consistent with the input. $E_S(D)$ is the smoothness term that reflects prior knowledge of the smoothness of our solution. These two terms are balanced by the parameter $\alpha$.

The data term $E_D(D, D^0)$: the data term is defined as:

$$E_D(D, D^0) = \sum_{i \in \Omega} \sum_{j \in N(i)} \omega^s_{i,j}\, \varphi_D\big(|D_i - D_j^0|^2\big) \quad (5)$$

where $\omega^s_{i,j}$ is a normalized Gaussian window (Eq.(6)) that decreases with the distance between pixels $i$ and $j$, and $\sigma_s$ is a constant that is defined by the user. $\varphi_D(\cdot)$ is the robust error norm function, which we denote as the exponential error norm. This function has long been used in image denoising, such as local mode filtering [19] and non-local filtering [14]. It is defined as:

$$\varphi(x^2) = 2\lambda^2 \Big(1 - \exp\Big(-\frac{x^2}{2\lambda^2}\Big)\Big) \quad (7)$$

^2 The result of the non-local means MRF in [13] is obtained with the default parameters in their own implementation.



Figure 3. Illustration of two error norm functions. (a) The L2 error norm. (b) The exponential error norm proposed in this paper.

The proposed exponential error norm is illustrated in Figure 3(b), alongside the L2 norm in Figure 3(a) for comparison.
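The contrast between the two norms can also be checked numerically. The snippet below is a minimal sketch that evaluates the L2 penalty and the exponential penalty of Eq.(7) on a few residuals; the bandwidth value and the residuals are hypothetical and chosen only for illustration.

```python
import numpy as np

def exp_norm(x2, lam):
    """Exponential error norm of Eq.(7), as a function of the squared residual x2."""
    return 2.0 * lam ** 2 * (1.0 - np.exp(-x2 / (2.0 * lam ** 2)))

def exp_norm_weight(x2, lam):
    """Derivative of the exponential norm with respect to x2 (the weight it induces)."""
    return np.exp(-x2 / (2.0 * lam ** 2))

lam = 0.1  # hypothetical bandwidth, for illustration only
residuals = np.array([0.0, 0.05, 0.1, 0.5, 1.0])
x2 = residuals ** 2

penalty_l2 = x2                      # L2 norm: grows without bound
penalty_exp = exp_norm(x2, lam)      # saturates at 2 * lam**2
weights = exp_norm_weight(x2, lam)   # large residuals receive almost no weight

for r, p2, pe, w in zip(residuals, penalty_l2, penalty_exp, weights):
    print(f"residual={r:4.2f}  L2={p2:.4f}  exp={pe:.4f}  weight={w:.3e}")
```

For small residuals the two penalties nearly coincide, while for large residuals the exponential penalty levels off at $2\lambda^2$, which is why a single outlier cannot dominate the sum.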

Our data term differs from that of previous methods [1][2][13][20][21], which only use the pixel-by-pixel depth difference modeled with the L2 error norm. In Eq.(5), it can be seen that the new data term measures depth differences pixel-by-patch, taking each reference pixel's neighboring region into account. According to [2], depth maps are largely piecewise smooth, and thus the pixel-by-patch difference calculation is more robust to noise than the pixel-by-pixel one. The calculation is further weighted by the Gaussian window $\omega^s_{i,j}$ in order to better maintain the local depth similarity in the neighboring area. The pixel-by-patch depth difference is then modeled with the exponential error norm defined in Eq.(7), which is quite robust against outliers [14]. Together, these make the data term robust against noise, as will be explained theoretically in Section 3.2.
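To make the pixel-by-patch idea concrete, the following sketch evaluates a data term of the form of Eq.(5) on a tiny flat depth map. It is an illustration under assumptions: the Gaussian window is taken to be a spatial Gaussian normalized to sum to one (the exact form of Eq.(6) is not reproduced here), and the patch radius, `sigma_s` and `lam` are hypothetical values.

```python
import numpy as np

def exp_norm(x2, lam):
    """Exponential error norm of Eq.(7), as a function of the squared residual."""
    return 2.0 * lam ** 2 * (1.0 - np.exp(-x2 / (2.0 * lam ** 2)))

def gaussian_window(radius, sigma_s):
    """Spatial Gaussian over a (2r+1) x (2r+1) patch, normalized to sum to 1 (assumed form)."""
    ax = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(ax, ax, indexing="ij")
    w = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma_s ** 2))
    return w / w.sum()

def data_term(D, D0, radius, sigma_s, lam):
    """Pixel-by-patch data term: D[i] is compared with every D0[j] for j in N(i)."""
    w = gaussian_window(radius, sigma_s)
    D0_pad = np.pad(D0, radius, mode="edge")
    H, W = D.shape
    energy = 0.0
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            # D0 shifted by the offset (dy - radius, dx - radius)
            D0_j = D0_pad[dy:dy + H, dx:dx + W]
            energy += w[dy, dx] * exp_norm((D - D0_j) ** 2, lam).sum()
    return energy

lam, radius, sigma_s = 0.1, 2, 1.0   # hypothetical values
D = np.full((9, 9), 0.5)             # candidate solution: a flat depth map
D0_clean = D.copy()
D0_outlier = D.copy()
D0_outlier[4, 4] += 1.0              # one corrupted sample in the interior

energy_clean = data_term(D, D0_clean, radius, sigma_s, lam)
energy_outlier = data_term(D, D0_outlier, radius, sigma_s, lam)
print(energy_clean, energy_outlier)
```

In this toy case the corrupted sample is seen by every pixel whose patch contains it, but its total cost is spread by the window and capped by the error norm, so the flat candidate solution remains cheap.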

The smoothness term $E_S(D)$: the smoothness term is guided by the companion color image. It is defined as:

$$E_S(D) = \sum_{i \in \Omega} \sum_{j \in N(i)} \omega_{i,j}\, \varphi_S\big(|D_i - D_j|^2\big) \quad (8)$$

We adopt the same function as in Eq.(7) to model the smoothness term, i.e. $\varphi_S(\cdot) = \varphi_D(\cdot)$. The color guided weight $\omega_{i,j}$ is defined as:

$$\omega_{i,j} = \omega^s_{i,j}\, \omega^c_{i,j} \quad (9)$$

where $\omega^s_{i,j}$ is the same as in Eq.(6) and $\omega^c_{i,j}$ is the same as in Eq.(2).

Note that the color guided weight in our model is quite similar to that of the MRF in [1], except for the additional Gaussian window $\omega^s_{i,j}$. However, the MRF in [1] enforces the co-occurrence of color edges and depth discontinuities through Eq.(2). Depth discontinuity cues are based entirely on the color image in their work (as seen in Eq.(3)). Instead, the smoothness term (i.e. Eq.(8)) in the proposed framework relaxes this strict dependence due to $\varphi_S(\cdot)$. As we will show in Section 3.2, the smoothness term results in a color image guided bilateral filter on the depth map at each update. Depth discontinuity cues are based not only on the color image but also on the depth map itself during the optimization process. The model is therefore better able to handle the case where the color edge and the depth discontinuity are not consistent with each other. This property is the key element that lets our model tackle texture copy artifacts and the blurring of depth discontinuities. We explain these points theoretically in Section 3.2.


3.2. Further analysis of our model 

We further analyze our upsampling model in order to demonstrate its advantages. Taking the derivative of the objective function of our model in Eq.(4) with respect to $D$ and setting it to zero yields the iterative formulation of Eq.(10), where we define

$$d^n_{i,j} = \varphi'_D\big(|D_i^n - D_j^0|^2\big), \quad s^n_{i,j} = \varphi'_S\big(|D_i^n - D_j^n|^2\big), \quad \varphi'(x^2) = \exp\Big(-\frac{x^2}{2\lambda^2}\Big) \quad (11)$$

Here $\varphi'$ is the derivative of $\varphi_D(x^2) = \varphi_S(x^2)$ defined in Eq.(7).

Our method has the following advantages: 

First, it is more robust against noise in the input. The first term in the numerator of Eq.(10) is the weighted sum of the initial depth values in $N(i)$. The weights are based on the difference between the latest updated depth map and the initial depth map, and are further weighted by the Gaussian window $\omega^s_{i,j}$. This term is related to the data term in the proposed framework. Compared with Eq.(3), the accuracy of the update at each pixel in each round no longer depends only on the initial upsampling value of the reference pixel. Instead, it is based on the local measurement in the neighboring area $N(i)$. This results from the proposed pixel-by-patch depth difference measurement and is more robust against noise. The effect is further enhanced by $d^n_{i,j}$, which results from the proposed exponential error norm. When the original depth map contains significant noise, the depth values in the neighboring area are unlikely to be smooth, which results in a large $|D_i^n - D_j^0|$. This in turn decreases $d^n_{i,j}$. As a result, such a noisy area has less effect on $D_i^{n+1}$ in each round of upsampling. Consequently, the data term is more robust to noise. To the best of our understanding, it is a new data term in optimization frameworks for image guided depth upsampling.

The second term in the numerator of Eq.(10) is related to the smoothness term. In the counterpart in Eq.(3), the existing methods enforce the assumption of consistency between the color edge and the depth discontinuity: the weight $\omega^c_{i,j}$ is determined only by the color difference in the neighboring area on the color image. In Eq.(10), we relax this strict assumption by applying color image guided bilateral filtering to the depth map $D^n$ in each round of upsampling. The weight value is determined by three factors: 1) the color similarity in the local area (same as the existing methods); 2) the distance between the reference pixel and its neighboring pixel (i.e. $\omega^s_{i,j}$); 3) the difference in depth value between the reference pixel and its neighboring pixels (i.e. $s^n_{i,j}$). $s^n_{i,j}$ reflects the property of the depth map. For the case where the depth region is homogeneous but the color is not smooth in the corresponding area, $s^n_{i,j}$ can eliminate the effects caused by $\omega^c_{i,j}$. Thus, it reduces texture copy artifacts. For the case where the depth region contains a depth discontinuity but the color is smooth in the corresponding area of the color image, the second term will be close to a bilateral filter in which $s^n_{i,j}$ preserves the depth discontinuity well. Thus the proposed smoothness term is better able to handle cases where the color edge is not consistent with the depth discontinuity. We do not assume that the color edge and the depth discontinuity are necessarily consistent with each other.

Figure 2(g) shows the result of our method. The noise in the homogeneous regions has been smoothed well, and the depth discontinuities are properly preserved even where no color edge corresponds to the depth discontinuity on the depth map.
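The behaviour analyzed in this section can be reproduced with a small fixed-point iteration. The code below is a sketch under stated assumptions, not the authors' implementation: the update is obtained by setting the derivative of Eq.(4) to zero with symmetric weights, the spatial and color weights are assumed to be plain Gaussians (`sigma_s`, `sigma_c`), a single global `lam` is used, and all parameter values are hypothetical. The spatial window is left unnormalized because its normalization constant cancels between numerator and denominator.

```python
import numpy as np

def update_depth(D, D0, I, alpha, lam, radius, sigma_s, sigma_c):
    """One fixed-point update of the depth map (sketch with assumed weight forms)."""
    H, W = D.shape
    Dp = np.pad(D, radius, mode="edge")
    D0p = np.pad(D0, radius, mode="edge")
    Ip = np.pad(I, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    num = np.zeros_like(D)
    den = np.zeros_like(D)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ys = slice(radius + dy, radius + dy + H)
            xs = slice(radius + dx, radius + dx + W)
            Dj, D0j, Ij = Dp[ys, xs], D0p[ys, xs], Ip[ys, xs]
            w_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))              # spatial weight
            w_c = np.exp(-np.sum((I - Ij) ** 2, axis=-1) / (2.0 * sigma_c ** 2))  # color weight
            d = np.exp(-(D - D0j) ** 2 / (2.0 * lam ** 2))   # data weight, as in Eq.(11)
            s = np.exp(-(D - Dj) ** 2 / (2.0 * lam ** 2))    # depth weight, as in Eq.(11)
            w_data = (1.0 - alpha) * w_s * d
            w_smooth = 2.0 * alpha * w_s * w_c * s
            num += w_data * D0j + w_smooth * Dj
            den += w_data + w_smooth
    return num / den

# Toy scene: a depth step with NO matching color edge (the color image is uniform).
rng = np.random.default_rng(0)
truth = np.full((16, 16), 0.2)
truth[:, 8:] = 0.8
D0 = truth + rng.normal(0.0, 0.02, truth.shape)   # noisy initial depth map
I = np.full((16, 16, 3), 0.5)

D = D0.copy()
for _ in range(5):
    D = update_depth(D, D0, I, alpha=0.9, lam=0.1, radius=2, sigma_s=2.0, sigma_c=0.1)

rmse_before = np.sqrt(np.mean((D0 - truth) ** 2))
rmse_after = np.sqrt(np.mean((D - truth) ** 2))
print(rmse_before, rmse_after)
```

Because the depth weight `s` is close to zero across the step, the two sides are smoothed separately even though the color weight is uniform, which mirrors the second case discussed above.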

3.3. Data driven adaptive bandwidth selection 

In this paper, the parameter $\lambda$ denotes the bandwidth of the exponential error norm function in Eq.(7), which plays a role similar to that of the tone term in the bilateral filter [18]. A large $\lambda$ smooths noise better but may over-smooth depth discontinuities. A small $\lambda$ preserves depth discontinuities better but performs poorly in noise smoothing. In this section, we describe an optimization method that adapts $\lambda$ to each pixel on the depth map. It is a data driven adaptive bandwidth selection. Because the depth map is largely piecewise smooth, we assume that the bandwidth is also regular and smooth. We add another regularization term, which consists of the L2 norm of the gradient of $\lambda_i$ ($i \in \Omega$), to the objective function in Eq.(4). This gives the objective of Eq.(12).


Eq.(12) is minimized with respect to $\lambda_i$ through steepest gradient descent according to Eq.(13), where $\tau$ is the given updating rate. The derivative of the objective function is given by Eq.(14), where $d^n_{i,j}$ and $s^n_{i,j}$ are the same as in Eq.(11).

Depth map upsampling and bandwidth selection are addressed iteratively by alternating the bandwidth update in Eq.(13) and the depth map update in Eq.(10). Note that most components of Eq.(14) have already been computed in Eq.(10). The computation cost for $\Delta\lambda$ is also quite small. The bandwidth selection step therefore adds little computation cost.
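One way to see why the bandwidth gradient is cheap is to differentiate the error norm of Eq.(7) with respect to $\lambda$. The sketch below does this analytically and checks the result against a finite difference. It assumes only the form of Eq.(7); the exact grouping of terms in Eq.(14) is not reproduced, and the test values are hypothetical.

```python
import numpy as np

def exp_norm(x2, lam):
    """Exponential error norm of Eq.(7), as a function of the squared residual."""
    return 2.0 * lam ** 2 * (1.0 - np.exp(-x2 / (2.0 * lam ** 2)))

def exp_norm_dlam(x2, lam):
    """Analytic derivative of the exponential norm with respect to the bandwidth lam."""
    e = np.exp(-x2 / (2.0 * lam ** 2))   # same quantity as d or s in Eq.(11)
    return 4.0 * lam * (1.0 - e) - 2.0 * x2 * e / lam

x2 = np.array([0.0, 0.01, 0.25, 1.0])    # hypothetical squared residuals
lam, h = 0.3, 1e-6

analytic = exp_norm_dlam(x2, lam)
numeric = (exp_norm(x2, lam + h) - exp_norm(x2, lam - h)) / (2.0 * h)
print(analytic)
print(numeric)
```

The only exponential in the derivative is the weight already evaluated for the depth update, which is consistent with the remark that most components of Eq.(14) are available from Eq.(10).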

Figure 1(f) illustrates a bandwidth map obtained by our method, where dark regions correspond to smaller bandwidth values and bright regions correspond to larger bandwidth values. This bandwidth map corresponds well to the structure of the depth map shown in Figure 1(b).


4. Experiments 

In this section, we evaluate our method quantitatively and qualitatively on both synthetic and real data. Comparisons are performed with other state-of-art methods whose source code is available. We show that our method outperforms other methods in most cases, both quantitatively and qualitatively. For more experimental results, please see our supplementary material.

4.1. Experiments on the synthetic data 

We first test our method on synthetic data from the Middlebury dataset [16]. We reuse the data in [21] and compare our method with the MRF in [1] and its extension, the non-local means MRF in [13], the image guided anisotropic total generalized variation upsampling in [2], the joint geodesic upsampling in [9], the color guided auto-regression upsampling in [21] and the cross-based local multipoint filter in [10]. The upsampling results are evaluated by the root mean square error (RMSE) between the original depth map and the upsampling result. Four upsampling factors are tested: 2x/4x/8x/16x. All values of both the color image and the depth map are normalized into the interval [0, 1] for convenience. However, the RMSE is still reported with the depth values in the interval [0, 255]. The parameters are set as follows. We slightly tune $\alpha$ for different upsampling factors, which is also the strategy adopted by other methods such as [2][21]. It is chosen as 0.8/0.9/0.96/0.99 for 2x/4x/8x/16x upsampling. Perturbations of the other parameters only marginally affect the final results. $\beta$ in Eq.(12) is set to 0.3. The neighborhood $N(i)$ is chosen as a 19 x 19 square patch. $\sigma_s$ and $\sigma_c$ in Eq.(9) are set to 9 and ^ respectively. The initial value of $\lambda_i$ in Eq.(13) is set to ^ for all $i \in \Omega$. Its updating rate $\tau$ in Eq.(13) is 0.3.
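The evaluation convention described above (depth maps stored in [0, 1], RMSE reported on the [0, 255] scale) can be sketched as follows. The function and constant names are hypothetical; the $\alpha$ values are the ones listed in the text.

```python
import numpy as np

# Values of alpha reported in the text for each upsampling factor.
ALPHA_BY_FACTOR = {2: 0.8, 4: 0.9, 8: 0.96, 16: 0.99}

def rmse_255(estimate, ground_truth):
    """RMSE on the [0, 255] scale for depth maps stored in [0, 1]."""
    diff = 255.0 * (np.asarray(estimate, dtype=float) - np.asarray(ground_truth, dtype=float))
    return float(np.sqrt(np.mean(diff ** 2)))

# A uniform error of 0.01 in normalized units corresponds to 2.55 gray levels.
example_error = rmse_255(np.full((4, 4), 0.51), np.full((4, 4), 0.50))
print(example_error)
```

Rescaling inside the metric keeps the optimization in normalized units while making the reported numbers comparable with methods evaluated on 8-bit depth maps.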

Table 1 summarizes the quantitative comparison results on the Middlebury dataset [16]. Our method outperforms the other methods in almost all cases. In particular, our results improve greatly on those of the MRF in [1] and the non-local means MRF in [13]. Even compared with the most recently proposed methods in [2] and [21], our method has a smaller RMSE in almost all cases. Note that the method in [2] needs thousands of iterations to converge, which is quite time consuming: 10,000 iterations for 2x and 4x upsampling, 15,000 iterations for 8x upsampling and 30,000 iterations for 16x upsampling, which takes hours with their own implementation^3. In contrast, our method needs only a few iterations to converge, for example fewer than 5 iterations for 2x upsampling and fewer than 50 iterations for 16x upsampling, which takes only a few minutes. The computation cost of the method in [21] is also high due to its complex shape-based color guided weights. Our method is an order of magnitude faster than theirs. Our computational cost is about double that of the MRF in [1] but about half that of the non-local means MRF in [13].

Figure 4 illustrates the improvement in upsampling results brought by our bandwidth selection. Without bandwidth selection, depth discontinuities are easily smoothed where the corresponding color edges are weak. Moreover, our bandwidth selection helps to preserve the fine details in the depth map. Figure 1(e) clearly illustrates the improvement. The small holes in the ring are properly preserved with the adaptive bandwidth selection, whereas this cannot be achieved without it, as shown in Figure 1(d).

Figure 5 shows examples of 8x upsampling for visual comparison. The results of the MRF in [1] and the non-local means MRF in [13] neither smooth the noise in homogeneous regions well nor properly preserve the depth discontinuities. As shown in the highlighted region of Moebius, their results suffer from a heavy texture copy effect while ours do not. The method in [2] avoids the texture copy effect, but it cannot preserve fine details, for example the small holes on the ring of Art and the small object of Moebius, which are shown in the highlighted regions. In contrast, our method not only smooths the noise in homogeneous regions and avoids texture copy but also properly preserves the depth discontinuities, even the fine details. Figure 6 further compares 16x upsampling results. Even for such a large upsampling factor, our method still smooths the noise and preserves sharp depth discontinuities, while none of the compared methods yields such performance, as shown in the figure.

^3 We perform the comparison with their own MATLAB implementation and all the parameters recommended by the author. The computational time reported in their paper was obtained with GPU acceleration.

Figure 4. Visual comparison of our method for 8x upsampling with and without bandwidth selection. The first row shows patches from Dolls and the second row shows patches from Reindeer. (a) The groundtruth. (b) The corresponding color image. (c) Results obtained without bandwidth selection. (d) Results obtained with bandwidth selection. (e) The corresponding bandwidth maps.

4.2. Experiments on the real data 

To test the effectiveness of our method on real ToF depth maps, we further test it on the real ToF depth maps from [2]. As far as we know, this is the only real ToF dataset that provides groundtruth. Three depth maps are included in this dataset, namely Books, Devil and Shark. All values in the depth maps are real depth values from the camera to the measured objects (in mm). The upsampling factor is about 6.25x. Table 2 summarizes the quantitative comparison on this real dataset. Our method outperforms all the other methods on these three depth maps. Figure 7 illustrates the visual comparison of the Books upsampling results. Note that the result of the MRF [ ] has a strong texture copy effect at the upper left corner of the book. The result of the non-local means MRF [13] still contains noise in the homogeneous regions. Our method smooths the noise in homogeneous regions well and preserves sharp depth discontinuities.

5. Conclusion 

In this paper, we propose a novel method for depth map upsampling guided by a high resolution color image. We model the upsampling task with an optimization framework that consists of a new data term and a new smoothness term. The proposed data term is based on the pixel-by-patch difference and is modeled with an exponential error norm function. It has been shown to be more robust against noise than one based on the pixel-by-pixel difference with the L2 norm as the error norm function. We relax the overly strict assumption of consistency between the color edge and the depth discontinuity, which is adopted by most existing methods. The new smoothness term lets our model obtain depth discontinuity cues not only from the guidance color image but also from the depth map itself. This lets our model tackle texture copy artifacts and preserve sharp depth discontinuities even when there is no corresponding color edge. Moreover, a data driven scheme is proposed to adaptively select the proper bandwidth of the exponential error norm function. This further improves the upsampling quality: fine details and sharp depth discontinuities are preserved even for large upsampling factors, for example 8x and 16x. Experimental results on both synthetic and real data show that our method outperforms other state-of-art methods.

Table 1. Quantitative comparison on the synthetic data from the Middlebury dataset [16]. Four upsampling factors are tested, i.e. 2x/4x/8x/16x. The results are evaluated in RMSE and the best results are marked with *.

               Art                         Book                        Dolls
Method         2x     4x     8x     16x    2x     4x     8x     16x    2x     4x     8x     16x
CLMF [10]      1.19   1.77   2.95   4.91   0.9    1.48   2.38   3.36   0.87   1.44   2.32   3.3
JGF [9]        2.36   2.74   3.64   5.46   2.12   2.25   2.49   3.25   2.09   2.24   2.56   3.28
MRF [1]        1.24   1.69   2.51   3.99   0.74   1.04   1.53   2.3    0.75   1.04   1.5    2.19
NLM-MRF [13]   1.66   2.47   3.44   5.55   1.19   1.47   2.06   3.1    1.19   1.56   2.15   3.04
TGV [2]        0.8    1.21   2.01   4.59   0.61   0.88   1.21   2.19   0.66   0.96   1.38   2.88
AR [21]        0.92   1.23   2.1    3.9    0.74   0.85   1.23   1.96   0.8    0.97   1.35   2.24
Ours           0.69*  1.14*  1.89*  3.26*  0.56*  0.8*   1.16*  1.72*  0.63*  0.88*  1.23*  1.78*

               Laundry                     Moebius                     Reindeer
Method         2x     4x     8x     16x    2x     4x     8x     16x    2x     4x     8x     16x
CLMF [10]      0.96   1.56   2.54   3.85   0.94   1.55   2.5    3.81   0.96   1.54   2.37   3.25
JGF [9]        2.18   2.4    2.89   3.94   2.16   2.37   2.85   3.9    2.09   2.22   2.49   3.25
MRF [1]        0.78   1.12   1.67   2.73   0.79   1.08   1.57   2.33   0.83   1.11   1.65   2.57
NLM-MRF [13]   1.34   1.73   2.41   3.85   1.2    1.5    2.13   2.95   1.26   1.65   2.46   3.66
TGV [2]        0.61   0.87   1.36   3.06   0.57   0.77*  1.23   2.74   0.61   0.85   1.3    3.41
AR [21]        0.73   0.93   1.34   2.24   0.72   0.82   1.25   2.16   0.77   0.87   1.3    2.73
Ours           0.6*   0.84*  1.22*  2.13*  0.55*  0.78   1.14*  1.74*  0.58*  0.84*  1.27*  2.13*
Table 2. Quantitative comparison on real data from [2]. The error is calculated as RMSE to the measured groundtruth in mm. The best results are marked with *.

Method         Books      Devil      Shark
Bicubic        16.23      17.78      16.66
CLMF [10]      13.89      14.55      15.1
JGF [9]        17.39      19.02      18.17
JBF [ ]        15.42      16.47      17.16
MRF [1]        13.87      15.36      16.07
NLM-MRF [13]   14.31      15.36      15.88
TGV [2]        12.8       14.97      15.53
AR [21]        13.28      14.73      15.86
Ours           12.58*     14.28*     14.74*



Figure 5. Visual comparison of 8x upsampling results of Art and Moebius from the Middlebury dataset [16]. (a) The input depth maps (in red boxes) and the corresponding color images. (b) The groundtruth depth maps. The results obtained by (c) the MRF in [1], (d) its extension, the non-local means MRF in [13], (e) the image guided total generalized variation upsampling in [2], (f) our method and (g) the corresponding bandwidth maps by our bandwidth selection. The regions in red boxes, which contain fine details, are highlighted.

Figure 6. Visual comparison of 16x upsampling results of Art from the Middlebury dataset [16]. (a) The bicubic interpolated depth map, and results by (b) the joint geodesic upsampling in [9], (c) the MRF in [1], (d) the non-local means MRF in [13], (e) the image guided total generalized variation upsampling in [2], (f) the color guided auto-regression upsampling in [21] and (g) our method.

Figure 7. Visual comparison on the real data Books from [2]. The first row shows (a) the measured groundtruth and results by (b) bicubic interpolation, (c) the cross-based local multipoint filter in [10], (d) the joint geodesic upsampling in [9], (e) the joint bilateral filter in [7], (f) the MRF in [1], (g) the non-local means MRF in [13], (h) the image guided total generalized variation upsampling in [2] and (i) our method. The intensity image together with the input depth map (in the red box) and the corresponding error maps are shown in the second row.

















